Adversarial Detection of Flash Malware: Limitations and Open Issues
During the past four years, Flash malware has become one of the most
insidious threats to detect, with almost 600 critical vulnerabilities targeting
Adobe Flash disclosed in the wild. Research has shown that machine learning can
be successfully used to detect Flash malware by leveraging static analysis to
extract information from the structure of the file or its bytecode. However,
the robustness of Flash malware detectors against well-crafted evasion attempts
- also known as adversarial examples - has never been investigated. In this
paper, we propose a security evaluation of a novel, representative Flash
detector that embeds a combination of the prominent, static features employed
by state-of-the-art tools. In particular, we discuss how to craft adversarial
Flash malware examples, showing that it suffices to manipulate the
corresponding source malware samples slightly to evade detection. We then
empirically demonstrate that popular defense techniques proposed to mitigate
evasion attempts, including re-training on adversarial examples, may not always
be sufficient to ensure robustness. We argue that this occurs when the feature
vectors extracted from adversarial examples become indistinguishable from those
of benign data, meaning that the given feature representation is intrinsically
vulnerable. In this respect, we are the first to formally define and
quantitatively characterize this vulnerability, highlighting when an attack can
be countered by solely improving the security of the learning algorithm, or
when it requires also considering additional features. We conclude the paper by
suggesting alternative research directions to improve the security of
learning-based Flash malware detectors.
Is Deep Learning Safe for Robot Vision? Adversarial Examples against the iCub Humanoid
Deep neural networks have been widely adopted in recent years, exhibiting
impressive performance in several application domains. It has, however, been
shown that they can be fooled by adversarial examples, i.e., images altered by
a barely-perceivable adversarial noise, carefully crafted to mislead
classification. In this work, we aim to evaluate the extent to which
robot-vision systems embodying deep-learning algorithms are vulnerable to
adversarial examples, and propose a computationally efficient countermeasure to
mitigate this threat, based on rejecting classification of anomalous inputs. We
then provide a clearer understanding of the safety properties of deep networks
through an intuitive empirical analysis, showing that the mapping learned by
such networks essentially violates the smoothness assumption of learning
algorithms. We finally discuss the main limitations of this work, including the
creation of real-world adversarial examples, and sketch promising research
directions.
Comment: Accepted for publication at the ICCV 2017 Workshop on Vision in
Practice on Autonomous Robots (ViPAR).
Why Do Adversarial Attacks Transfer? Explaining Transferability of Evasion and Poisoning Attacks
Transferability captures the ability of an attack against a machine-learning
model to be effective against a different, potentially unknown, model.
Empirical evidence for transferability has been shown in previous work, but the
underlying reasons why an attack transfers or not are not yet well understood.
In this paper, we present a comprehensive analysis aimed at investigating the
transferability of both test-time evasion and training-time poisoning attacks.
We provide a unifying optimization framework for evasion and poisoning attacks,
and a formal definition of transferability of such attacks. We highlight two
main factors contributing to attack transferability: the intrinsic adversarial
vulnerability of the target model, and the complexity of the surrogate model
used to optimize the attack. Based on these insights, we define three metrics
that impact an attack's transferability. Interestingly, our results derived
from theoretical analysis hold for both evasion and poisoning attacks, and are
confirmed experimentally using a wide range of linear and non-linear
classifiers and datasets.
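As a rough illustration of what "transferring" means in practice, the sketch below crafts evasion points against a surrogate linear classifier and measures how often they also fool a different target classifier. This is only an empirical success rate, not one of the three metrics defined in the paper; the helper names, weights, and step size are all hypothetical.

```python
def linear_classifier(weights, bias):
    """Return a predict function: 1 if w.x + b > 0, else 0."""
    def predict(x):
        score = sum(w * v for w, v in zip(weights, x)) + bias
        return 1 if score > 0 else 0
    return predict


def evade_linear(x, weights, step):
    """Move x against the surrogate's weight vector.

    For a linear score the gradient is the weight vector itself,
    so this is a single gradient step that lowers the score.
    """
    return [v - step * w for v, w in zip(x, weights)]


def transfer_rate(target_predict, adversarial, labels):
    """Fraction of adversarial points that the target model misclassifies."""
    fooled = sum(1 for x, y in zip(adversarial, labels) if target_predict(x) != y)
    return fooled / len(adversarial)


# Hypothetical surrogate (known to the attacker) and target (unknown) models.
surrogate_weights = [1.0, 1.0]
surrogate = linear_classifier(surrogate_weights, 0.0)
target = linear_classifier([0.0, 1.0], 0.0)

# Two points of class 1 that the attacker wants classified as 0.
points = [[1.0, 1.0], [0.5, 2.0]]
labels = [1, 1]

adversarial = [evade_linear(x, surrogate_weights, step=1.5) for x in points]
surrogate_fooled = [surrogate(x) for x in adversarial]
rate = transfer_rate(target, adversarial, labels)
```

Both points evade the surrogate here, but only one of them also evades the target, which weights the features differently.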
BAARD: Blocking Adversarial Examples by Testing for Applicability, Reliability and Decidability
Adversarial defenses protect machine learning models from adversarial
attacks, but are often tailored to one type of model or attack. The lack of
information on unknown potential attacks makes detecting adversarial examples
challenging. Additionally, attackers do not need to follow the rules made by
the defender. To address this problem, we take inspiration from the concept of
Applicability Domain in cheminformatics. Cheminformatics models struggle to
make accurate predictions because only a limited number of compounds are known
and available for training. Applicability Domain defines a domain based on the
known compounds and rejects any unknown compound that falls outside the domain.
Similarly, adversarial examples start as harmless inputs, but can be
manipulated to evade reliable classification by moving outside the domain of
the classifier. We are the first to identify the similarity between
Applicability Domain and adversarial detection. Instead of focusing on unknown
attacks, we focus on what is known, the training data. We propose a simple yet
robust triple-stage data-driven framework that checks the input globally and
locally, and confirms that they are coherent with the model's output. This
framework can be applied to any classification model and is not limited to
specific attacks. We demonstrate these three stages work as one unit,
effectively detecting various attacks, even for a white-box scenario.
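The Applicability Domain idea of rejecting inputs that fall outside the region covered by the training data can be sketched with a per-feature bounding box. This is a minimal illustration of that one idea, not the BAARD implementation; the function names and the `tolerance` parameter are assumptions made for the example.

```python
def fit_bounding_box(training_data):
    """Per-feature (min, max) ranges observed in the training data."""
    columns = list(zip(*training_data))
    return [(min(col), max(col)) for col in columns]


def in_domain(box, x, tolerance=0.0):
    """True if every feature of x lies within the training range (plus tolerance)."""
    return all(lo - tolerance <= v <= hi + tolerance for (lo, hi), v in zip(box, x))


# Toy training set with two features (hypothetical values).
train = [[0.1, 0.2], [0.4, 0.9], [0.3, 0.5]]
box = fit_bounding_box(train)

inside = in_domain(box, [0.2, 0.6])   # within both feature ranges
outside = in_domain(box, [0.2, 1.5])  # second feature exceeds the training maximum
```

An input for which `in_domain` returns `False` would be rejected rather than classified. A check this coarse only covers the "known data" side of the argument; the abstract describes further stages that check the input locally and against the model's output.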
Stateful Detection of Adversarial Reprogramming
Adversarial reprogramming allows stealing computational resources by
repurposing machine learning models to perform a different task chosen by the
attacker. For example, a model trained to recognize images of animals can be
reprogrammed to recognize medical images by embedding an adversarial program in
the images provided as inputs. This attack can be perpetrated even if the
target model is a black box, provided that the machine-learning model is
provided as a service and the attacker can query the model and collect its
outputs. So far, no defense has been demonstrated effective in this scenario.
We show for the first time that this attack is detectable using stateful
defenses, which store the queries made to the classifier and detect the
abnormal cases in which they are similar. Once a malicious query is detected,
the account of the user who made it can be blocked. Thus, the attacker must
create many accounts to perpetrate the attack. To decrease this number, the
attacker could create the adversarial program against a surrogate classifier
and then fine-tune it by making a few queries to the target model. In this
scenario, the effectiveness of the stateful defense is reduced, but we show
that it is still effective.
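The stateful mechanism described above (store each account's queries, flag the abnormal case in which they are similar, then block the account) can be sketched as follows. This is a simplified illustration under stated assumptions, not the paper's detector: the Euclidean distance, the `distance_threshold` and `max_similar` values, and the class name are all hypothetical.

```python
import math
from collections import defaultdict


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


class StatefulDetector:
    """Stores each account's queries and blocks accounts that send near-duplicates."""

    def __init__(self, distance_threshold=0.5, max_similar=2):
        self.distance_threshold = distance_threshold  # hypothetical value
        self.max_similar = max_similar                # hypothetical value
        self.history = defaultdict(list)
        self.blocked = set()

    def submit(self, account, query):
        """Return True if the query is accepted, False if the account is blocked."""
        if account in self.blocked:
            return False
        # Count stored queries from this account that are close to the new one.
        similar = sum(
            1 for past in self.history[account]
            if euclidean(past, query) < self.distance_threshold
        )
        self.history[account].append(query)
        if similar >= self.max_similar:
            self.blocked.add(account)
            return False
        return True


detector = StatefulDetector()
# Three near-identical queries from one account: the third trips the detector.
results = [detector.submit("user-a", [0.0, 0.01 * i]) for i in range(3)]
# Dissimilar queries from another account are all accepted.
benign = [detector.submit("user-b", [float(i), float(i)]) for i in range(3)]
```

Because the history is kept per account, an attacker can spread queries over many accounts, which is the cost the abstract refers to.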
Indicators of Attack Failure: Debugging and Improving Optimization of Adversarial Examples
Evaluating robustness of machine-learning models to adversarial examples is a
challenging problem. Many defenses have been shown to provide a false sense of
security by causing gradient-based attacks to fail, and they have been broken
under more rigorous evaluations. Although guidelines and best practices have
been suggested to improve current adversarial robustness evaluations, the lack
of automatic testing and debugging tools makes it difficult to apply these
recommendations in a systematic manner. In this work, we overcome these
limitations by (i) defining a set of quantitative indicators which unveil
common failures in the optimization of gradient-based attacks, and (ii)
proposing specific mitigation strategies within a systematic evaluation
protocol. Our extensive experimental analysis shows that the proposed
indicators of failure can be used to visualize, debug and improve current
adversarial robustness evaluations, providing a first concrete step towards
automating and systematizing current adversarial robustness evaluations. Our
open-source code is available at:
https://github.com/pralab/IndicatorsOfAttackFailure